Morphological Analysis and Generation of Arabic Nouns: A Morphemic Functional Approach

Authors

  • Mohamed Altantawy
  • Nizar Habash
  • Owen Rambow
  • Ibrahim Saleh
Abstract

MAGEAD is a morphological analyzer and generator for Modern Standard Arabic (MSA) and its dialects. We introduced MAGEAD in previous work with an implementation of MSA and Levantine Arabic verbs. In this paper, we port that system to MSA nominals (nouns and adjectives), which are far more complex to model than verbs. Our system is a functional morphological analyzer and generator, i.e., it analyzes to and generates from a representation consisting of a lexeme and linguistic feature-value pairs, where the features are syntactically (and perhaps semantically) meaningful, rather than just morphologically. A detailed evaluation of the current implementation comparing it to a commonly used morphological analyzer shows that it has good morphological coverage with precision and recall scores in the 90s. An error analysis reveals that the majority of recall and precision errors are problems in the gold standard or a result of the discrepancy between different models of form-based/functional morphology.

1. Goal of This Paper

In previous work, we have presented MAGEAD, a morphological analyzer and generator for Modern Standard Arabic (MSA) verbs, and we have extended that work to cover Levantine Arabic as well (Habash et al., 2005; Habash and Rambow, 2006). In this paper, we port that system to MSA nominals (nouns and adjectives). Our system is a functional morphological analyzer and generator, i.e., it analyzes to and generates from a representation consisting of a lexeme and linguistic feature-value pairs, where the features are syntactically (and perhaps semantically) meaningful, rather than just morphologically. In this perspective, nouns turn out to be far more complex than verbs. This is because all variants (MSA and dialects) of Arabic have many “broken plurals” (irregular plurals), which are very common, and irregular feminine forms, which are less common. Furthermore, the same surface morpheme can have different morphological functions depending on context.
For example, the morpheme Ta-Marbuta (+ة +h̄), usually associated with the feminine singular (as in شجرة šjrh̄ ‘tree’), can appear on the plural form of certain masculine nouns (as in أنظمة ÂnĎmh̄ ‘systems’). (All Arabic transliterations are provided in the Habash-Soudi-Buckwalter transliteration scheme; Habash et al., 2007.) This discrepancy between the surface form-based morphology and the functional morphology has only recently been addressed in depth in a computational system: see the transformation by Smrž (2007) of the form-based Buckwalter morphological analyzer (Buckwalter, 2004). This paper differs from Smrž (2007) in that we use “deep” morphemes throughout, i.e., our system includes both a model of roots, patterns, and morphophonemic/orthographic rules, and a complete functional account of morphology.

Because of the prevalence of irregular inflectional forms among Arabic nominals, the lexicon plays a very important role in a functional morphological analyzer or generator for Arabic: we need to be able to relate irregular forms to their lexemes, and this can only be done with a lexicon. In this paper, we do not present work on a lexicon, and concentrate on the computation of morphology instead, including the interface to the lexicon. Our evaluation aims at measuring performance on words which are in our lexicon, not the lexicon itself. Future work will address the crucial issue of creating and evaluating a comprehensive lexicon.

This paper is structured as follows. We present the relevant linguistic facts in more detail in Section 2. We compare our work to related work in Section 3. The computational machinery is presented in Section 4. We present the morphological behavior class hierarchy (the interface to the lexicon) in Section 5. Morphophonemic rules are presented in Section 6. We give an evaluation of MAGEAD in Section 7.

2. Overview of Arabic Nominal Morphology

Arabic is a morphologically rich and complex language.
For nominals, the inflectional variants are as follows:

  • Number: singular, dual, plural. Some lexemes only have a singular form, such as نمل nml ‘ants as a collective’.
  • Gender: masculine, feminine. Note that only some lexemes (those nouns denoting types of humans, such as كاتب kAtib ‘writer’, and all adjectives) show inflection for gender; however, if the lexeme does not inflect for gender (for example أذن Âuðun ‘ear’), it has an inherent gender (in this case, feminine).
  • Case: nominative, accusative, genitive.
  • State: definite, indefinite, construct.

We follow Fischer (2001) in his analysis of the morphological determination system for MSA. State is expressed as a suffix and should not be confused with the presence of the ال+ Al+ definite determiner; however, there is an interaction: state cannot be indefinite in the presence ...
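
The contrast between form-based and functional morphology described above can be made concrete with a small sketch. The Python below is only an illustration, not MAGEAD's implementation: the class names, the `form_based_guess` heuristic, and the two-entry `TOY_LEXICON` are all hypothetical, and the lexeme citation form nĎAm ‘system’ is supplied here for the example rather than taken from the text. It represents an analysis as a lexeme plus feature-value pairs (number, gender, case, state) and shows that a surface-only reading of the Ta-Marbuta suffix disagrees with the functional features for ÂnĎmh̄ ‘systems’.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Number(Enum):
    SG = "singular"
    DU = "dual"
    PL = "plural"


class Gender(Enum):
    MASC = "masculine"
    FEM = "feminine"


class Case(Enum):
    NOM = "nominative"
    ACC = "accusative"
    GEN = "genitive"


class State(Enum):
    DEF = "definite"
    INDEF = "indefinite"
    CONSTRUCT = "construct"


@dataclass(frozen=True)
class FunctionalAnalysis:
    """A lexeme plus functional feature-value pairs (hypothetical layout)."""
    lexeme: str
    number: Number
    gender: Gender
    case: Optional[Case] = None    # left unspecified in this toy example
    state: Optional[State] = None  # left unspecified in this toy example


# Ta-Marbuta in transliteration: 'h' followed by a combining macron.
TA_MARBUTA = "h\u0304"


def form_based_guess(surface: str) -> Tuple[Gender, Number]:
    """Naive surface reading: a final Ta-Marbuta suggests feminine singular."""
    if surface.endswith(TA_MARBUTA):
        return Gender.FEM, Number.SG
    return Gender.MASC, Number.SG


tree = "šjr" + TA_MARBUTA      # 'tree'
systems = "ÂnĎm" + TA_MARBUTA  # 'systems'

# Toy lexicon relating surface forms to lexemes and functional features.
# The lexeme "nĎAm" is an assumption added for illustration.
TOY_LEXICON = {
    tree: FunctionalAnalysis(tree, Number.SG, Gender.FEM),
    systems: FunctionalAnalysis("nĎAm", Number.PL, Gender.MASC),
}

for surface in (tree, systems):
    functional = TOY_LEXICON[surface]
    agrees = form_based_guess(surface) == (functional.gender, functional.number)
    print(surface, functional.lexeme, "form and function agree:", agrees)
```

For ‘tree’ the surface guess and the lexicon entry coincide, while for ‘systems’ only the lexicon lookup recovers the masculine plural reading, which is why the lexicon interface matters for irregular forms.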




Publication date: 2010